RBCN: Rectified Binary Convolutional Networks with Generative Adversarial Learning 65
where $\circ$ is an operator that obtains the pruned weight with the mask $M_p$. The rest of
the forward propagation in the pruned RBCNs is the same as in the RBCNs.
In pruned RBCNs, the parameters to be learned and updated are the full-precision filters $W_p$,
the learnable matrices $C_p$, and the soft mask $M_p$. In each convolutional layer, these three
sets of parameters are learned jointly.
Update $M_p$. $M_p$ is updated by FISTA [141] with the initialization $\alpha_{(1)} = 1$. Then
we obtain the following:
$$\alpha_{(k+1)} = \frac{1}{2}\left(1 + \sqrt{1 + 4\alpha_{(k)}^2}\right), \tag{3.84}$$
$$y_{(k+1)} = M_{p,(k)} + \frac{\alpha_{(k)} - 1}{\alpha_{(k+1)}}\left(M_{p,(k)} - M_{p,(k-1)}\right), \tag{3.85}$$
$$M_{p,(k+1)} = \mathrm{prox}_{\eta_{(k+1)}\lambda\|\cdot\|_1}\!\left(y_{(k+1)} - \eta_{(k+1)}\frac{\partial (L_{\mathrm{Adv}_p} + L_{\mathrm{Data}_p})}{\partial y_{(k+1)}}\right), \tag{3.86}$$
where $\eta_{(k+1)}$ is the learning rate in iteration $k+1$ and $\mathrm{prox}_{\eta\lambda\|\cdot\|_1}(z_i) = \mathrm{sign}(z_i)\cdot(|z_i| - \eta\lambda)_+$; more details can be found in [142].
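The $M_p$ update above is standard FISTA with a soft-thresholding proximal step. A minimal NumPy sketch, assuming an externally supplied `grad_fn` that evaluates $\partial(L_{\mathrm{Adv}_p} + L_{\mathrm{Data}_p})/\partial y$ at the momentum point; the function names and scalar step size `eta` are illustrative, not from the original:

```python
import numpy as np

def soft_threshold(z, thresh):
    # prox of thresh * ||.||_1: sign(z) * max(|z| - thresh, 0)
    return np.sign(z) * np.maximum(np.abs(z) - thresh, 0.0)

def fista_step(M_curr, M_prev, grad_fn, alpha, eta, lam):
    """One FISTA update of the soft mask, following Eqs. (3.84)-(3.86)."""
    alpha_next = 0.5 * (1.0 + np.sqrt(1.0 + 4.0 * alpha ** 2))      # (3.84)
    y = M_curr + (alpha - 1.0) / alpha_next * (M_curr - M_prev)     # (3.85)
    M_next = soft_threshold(y - eta * grad_fn(y), eta * lam)        # (3.86)
    return M_next, alpha_next
```

Repeated application drives small mask entries exactly to zero, which is what makes the $\ell_1$ proximal step suitable for pruning.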
Update $W_p$. Let $\delta_{W^l_{p,i}}$ be the gradient of the full-precision filter $W^l_{p,i}$. During backpropagation, the gradients first pass to $\hat{W}^l_{p,i}$ and then to $W^l_{p,i}$. Furthermore,
$$\delta_{W^l_{p,i}} = \frac{\partial L_p}{\partial \hat{W}^l_{p,i}} = \frac{\partial L_{S_p}}{\partial \hat{W}^l_{p,i}} + \frac{\partial L_{\mathrm{Adv}_p}}{\partial \hat{W}^l_{p,i}} + \frac{\partial L_{\mathrm{Kernel}_p}}{\partial \hat{W}^l_{p,i}} + \frac{\partial L_{\mathrm{Data}_p}}{\partial \hat{W}^l_{p,i}}, \tag{3.87}$$
and
$$W^l_{p,i} \leftarrow W^l_{p,i} - \eta_{p,1}\,\delta_{W^l_{p,i}}, \tag{3.88}$$
where $\eta_{p,1}$ is the learning rate, and $\frac{\partial L_{\mathrm{Kernel}_p}}{\partial \hat{W}^l_{p,i}}$ and $\frac{\partial L_{\mathrm{Adv}_p}}{\partial \hat{W}^l_{p,i}}$ are
$$\frac{\partial L_{\mathrm{Kernel}_p}}{\partial \hat{W}^l_{p,i}} = -\lambda_1 \left(W^l_{p,i} - C^l_p \hat{W}^l_{p,i}\right) C^l_p, \tag{3.89}$$
$$\frac{\partial L_{\mathrm{Adv}_p}}{\partial \hat{W}^l_{p,i}} = -2\left(1 - D(T^l_{p,i}; Y_p)\right) \frac{\partial D_p}{\partial \hat{W}^l_{p,i}}. \tag{3.90}$$
And
$$\frac{\partial L_{\mathrm{Data}_p}}{\partial \hat{W}^l_{p,i}} = -\frac{1}{n}\left(R_p - T_p\right) \frac{\partial T_p}{\partial \hat{W}^l_{p,i}}. \tag{3.91}$$
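Eq. (3.89) is exactly the gradient one obtains from a quadratic kernel loss of the form $L_{\mathrm{Kernel}_p} = \frac{\lambda_1}{2}\|W - C\hat{W}\|^2$. A small NumPy sketch with a finite-difference check, under the simplifying assumption (for illustration only) that $C^l_p$ is a scalar rather than a learnable matrix:

```python
import numpy as np

# Assumed quadratic form of the kernel loss (illustrative):
# L_Kernel = (lambda1 / 2) * ||W - C * W_hat||^2, with scalar C.
def kernel_loss(W, W_hat, C, lam1):
    return 0.5 * lam1 * np.sum((W - C * W_hat) ** 2)

def grad_kernel_wrt_what(W, W_hat, C, lam1):
    # Eq. (3.89): dL_Kernel/dW_hat = -lambda1 * (W - C * W_hat) * C
    return -lam1 * (W - C * W_hat) * C

# Finite-difference check that the analytic gradient matches the loss.
rng = np.random.default_rng(0)
W, W_hat = rng.normal(size=4), rng.normal(size=4)
C, lam1, eps = 1.3, 0.5, 1e-6
numeric = np.array([
    (kernel_loss(W, W_hat + eps * e, C, lam1)
     - kernel_loss(W, W_hat - eps * e, C, lam1)) / (2 * eps)
    for e in np.eye(4)
])
assert np.allclose(numeric, grad_kernel_wrt_what(W, W_hat, C, lam1), atol=1e-5)
```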
Update $C_p$. We further update the learnable matrix $C^l_p$ with $W^l_p$ and $M^l_p$ fixed. Let $\delta_{C^l_p}$
be the gradient of $C^l_p$. Then we have
$$\delta_{C^l_p} = \frac{\partial L_p}{\partial C^l_p} = \frac{\partial L_{S_p}}{\partial C^l_p} + \frac{\partial L_{\mathrm{Adv}_p}}{\partial C^l_p} + \frac{\partial L_{\mathrm{Kernel}_p}}{\partial C^l_p} + \frac{\partial L_{\mathrm{Data}_p}}{\partial C^l_p}, \tag{3.92}$$
and
$$C^l_p \leftarrow C^l_p - \eta_{p,2}\,\delta_{C^l_p}, \tag{3.93}$$
where $\eta_{p,2}$ is the learning rate, and $\frac{\partial L_{\mathrm{Kernel}_p}}{\partial C^l_p}$ and $\frac{\partial L_{\mathrm{Adv}_p}}{\partial C^l_p}$ are
$$\frac{\partial L_{\mathrm{Kernel}_p}}{\partial C^l_p} = -\lambda_1 \sum_i \left(W^l_{p,i} - C^l_p \hat{W}^l_{p,i}\right) \hat{W}^l_{p,i}, \tag{3.94}$$
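The same quadratic view reproduces Eq. (3.94): differentiating $\frac{\lambda_1}{2}\sum_i \|W^l_{p,i} - C^l_p \hat{W}^l_{p,i}\|^2$ with respect to $C^l_p$ gives the summed term above. A sketch pairing that gradient with the descent step of Eq. (3.93), again assuming a scalar $C$ purely for illustration ($C^l_p$ is a learnable matrix in RBCN):

```python
import numpy as np

# Illustrative quadratic kernel loss with scalar C (an assumption):
# L_Kernel = (lambda1 / 2) * sum_i ||W_i - C * W_hat_i||^2
def kernel_loss(W, W_hat, C, lam1):
    return 0.5 * lam1 * np.sum((W - C * W_hat) ** 2)

def grad_kernel_wrt_c(W, W_hat, C, lam1):
    # Eq. (3.94): dL_Kernel/dC = -lambda1 * sum_i (W_i - C*W_hat_i) * W_hat_i
    return -lam1 * np.sum((W - C * W_hat) * W_hat)

def update_c(C, delta_C, eta_p2):
    # Eq. (3.93): gradient-descent step on C with learning rate eta_p2
    return C - eta_p2 * delta_C

# Finite-difference check of the analytic gradient in C.
rng = np.random.default_rng(1)
W, W_hat = rng.normal(size=5), rng.normal(size=5)
C, lam1, eps = 0.7, 0.5, 1e-6
numeric = (kernel_loss(W, W_hat, C + eps, lam1)
           - kernel_loss(W, W_hat, C - eps, lam1)) / (2 * eps)
assert np.isclose(numeric, grad_kernel_wrt_c(W, W_hat, C, lam1), atol=1e-6)
```

Because $W^l_p$ and $M^l_p$ are held fixed during this step, the $C$ update is a simple one-dimensional descent per layer in this scalar sketch.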